Amendments to the Claims:

This listing of claims replaces all prior versions and listings of claims in the application: 

Listing of Claims:
1-2. (Cancelled) 

3. (Currently amended) The method of claim [[1]] 99 further including recognizing a 
gesture associated with the object by analyzing changes in the position information of the object, 
and controlling the computer application based on the recognized gesture. 

4. (Original) The method of claim 3 further including: 
determining an application state of the computer application; and 
using the application state in recognizing the gesture. 

5. (Currently amended) The method of claim [[1]] 99 wherein the object is the user. 

6. (Currently amended) The method of claim [[1]] 99 wherein the object is a part of the 
user.

7. (Currently amended) The method of claim [[1]] 99 further including providing feedback
to the user relative to the computer application. 

8. (Currently amended) The method of claim [[1]] 99 [[wherein processing the stereo image
to determine position information of the object further includes]] further including mapping the
position information from position coordinates associated with the object to screen coordinates
associated with the computer application.

9-10. (Cancelled)

11. (Currently amended) The method of claim [[9]] 99 [[wherein processing the stereo image
further includes]] further including:

analyzing the scene description to identify a change in position of the object; and 
mapping the change in position of the object. 

12. (Currently amended) The method of claim [[9]] 100 wherein [[processing the stereo image
to produce the scene description]] generating the scene description from stereo video images
further includes:

processing the stereo image to identify matching pairs of features in the stereo image; and 
calculating a disparity and a position for each matching feature pair to create a scene 
description. 

13. (Currently amended) The method of claim 12 wherein generating the scene description
from stereo images includes:

[[capturing the stereo image further includes]] capturing a reference image from a reference
camera and a comparison image from a comparison camera; and

[[processing the stereo image further includes]] processing the reference image and the
comparison image to create pairs of features for the scene description.

14. (Original) The method of claim 13 wherein processing the stereo image to identify 
matching pairs of features in the stereo image further includes: 

identifying features in the reference image; 

generating for each feature in the reference image a set of candidate matching features in 
the comparison image; and 
producing a feature pair by selecting a best matching feature from the set of candidate
matching features for each feature in the reference image. 

15. (Original) The method of claim 13 wherein processing the stereo image further includes 
filtering the reference image and the comparison image. 

16. (Original) The method of claim 14 wherein producing the feature pair further includes:
calculating a match score and rank for each of the candidate matching features; and
selecting the candidate matching feature with the highest match score to produce the
feature pair.

17. (Original) The method of claim 14 wherein generating for each feature in the reference
image a set of candidate matching features further includes selecting candidate matching
features from a predefined range in the comparison image.

18. (Original) The method of claim 16 wherein feature pairs are eliminated based upon the 
match score of the candidate matching feature. 

19. (Original) The method of claim 18 wherein feature pairs are eliminated if the match 
score of the top ranking candidate matching feature is below a predefined threshold. 

20. (Original) The method of claim 18 wherein the feature pair is eliminated if the match 
score of the top ranking candidate matching feature is within a predefined threshold of the match 
score of a lower ranking candidate matching feature. 
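
For illustration only, and not as part of the claim language, the selection and elimination rules of claims 16, 19 and 20 might be sketched as follows. The function name, the (feature id, score) tuple layout and the parameters `min_score` and `min_margin` are assumptions introduced for this sketch; the claims do not specify them.

```python
from typing import Optional, Sequence, Tuple


def select_best_candidate(
    candidates: Sequence[Tuple[int, float]],
    min_score: float,
    min_margin: float,
) -> Optional[int]:
    """Return the id of the winning candidate, or None if the pair is eliminated.

    candidates: hypothetical (feature_id, match_score) tuples for one
    reference-image feature.
    """
    if not candidates:
        return None
    # Rank the candidates by match score, highest first (claim 16).
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_id, best_score = ranked[0]
    # Eliminate the pair if the top score is below a threshold (claim 19).
    if best_score < min_score:
        return None
    # Eliminate the pair if the runner-up scores nearly as well (claim 20).
    if len(ranked) > 1 and best_score - ranked[1][1] <= min_margin:
        return None
    return best_id
```

Under these assumptions, a weak best match and an ambiguous best match are both rejected rather than passed on to the scene description.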



21. (Original) The method of claim 16 wherein calculating the match score further includes:
identifying those feature pairs that are neighboring;
adjusting the match score of feature pairs in proportion to the match score of neighboring
candidate matching features at similar disparity; and 

selecting the candidate matching feature with the highest adjusted match score to create 
the feature pair. 

22. (Original) The method of claim 16 wherein feature pairs are eliminated by:
applying the comparison image as the reference image and the reference image as the
comparison image to produce a second set of feature pairs; and

eliminating those feature pairs in the original set of feature pairs which do not have a 
corresponding feature pair in the second set of feature pairs. 
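
For illustration only, and not as part of the claim language, the two-way consistency check of claim 22 might be sketched as follows. Representing each set of feature pairs as a dictionary keyed by feature id is an assumption of this sketch.

```python
from typing import Dict


def cross_check(forward: Dict[int, int], backward: Dict[int, int]) -> Dict[int, int]:
    """Keep only the feature pairs confirmed by matching in both directions.

    forward:  reference feature id  -> matched comparison feature id
    backward: comparison feature id -> matched reference feature id
              (obtained by swapping the roles of the two images)
    """
    return {
        ref_id: cmp_id
        for ref_id, cmp_id in forward.items()
        if backward.get(cmp_id) == ref_id
    }
```

A pair survives only if the swapped-image pass maps the comparison feature back to the same reference feature.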

23. (Original) The method of claim 12 further comprising: 

for each feature pair in the scene description, calculating real world coordinates by 
transforming the disparity and position of each feature pair relative to the real world coordinates 
of the stereo image. 
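
For illustration only, and not as part of the claim language, one common way to transform a disparity and an image position into real-world coordinates is the pinhole model for a rectified stereo pair. The claim does not specify the transformation; the rectified geometry and the parameters `focal_px`, `baseline_m`, `cx` and `cy` are assumptions of this sketch.

```python
from typing import Tuple


def disparity_to_world(
    u: float,
    v: float,
    disparity: float,
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
) -> Tuple[float, float, float]:
    """Convert an image position (u, v) and a disparity to (x, y, z) in metres.

    Assumes a rectified pair with focal length focal_px (pixels), camera
    separation baseline_m (metres) and principal point (cx, cy) (pixels).
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    z = focal_px * baseline_m / disparity  # depth is inversely proportional to disparity
    x = (u - cx) * z / focal_px            # horizontal offset from the optical axis
    y = (v - cy) * z / focal_px            # vertical offset from the optical axis
    return (x, y, z)
```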

24. (Original) The method of claim 14 wherein selecting features further includes dividing 
the reference image and the comparison image of the stereo image into blocks. 

25. (Original) The method of claim 24 wherein the feature is described by a pattern of
luminance of the pixels contained within the blocks.

26. (Original) The method of claim 24 wherein dividing further includes dividing the images 
into pixel blocks having a fixed size. 



27. (Original) The method of claim 26 wherein the pixel blocks are 8 x 8 pixel blocks.

28. (Currently amended) The method of claim [[10]] 99 wherein analyzing the scene
description to determine the position information of the object further includes cropping the 
scene description to exclude feature information lying outside of an object detection region.

29. (Previously presented) The method of claim 28 wherein cropping further includes 
establishing a boundary of the object detection region. 

30. (Currently amended) The method of claim [[10]] 99 wherein analyzing the scene 
description to determine the position information of the object further includes: 

clustering the feature information in a region of interest into clusters having a collection 
of features by comparison to neighboring feature information within a predefined range; and 
calculating a position for each of the clusters. 

31. (Original) The method of claim 30 further including eliminating those clusters having
less than a predefined threshold of features. 
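
For illustration only, and not as part of the claim language, the clustering of claims 30 and 31 might be sketched as a simple region-growing pass over three-dimensional feature positions. The use of Euclidean distance, the centroid as the cluster position, and the names `max_dist` and `min_size` are assumptions of this sketch.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def cluster_features(
    points: Sequence[Point], max_dist: float, min_size: int = 1
) -> List[Tuple[Point, List[int]]]:
    """Group features whose neighbors lie within max_dist; return (centroid, member ids)."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        members = [seed]
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            # Collect unvisited features within the predefined range of feature i.
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= max_dist]
            for j in near:
                unvisited.remove(j)
                members.append(j)
                queue.append(j)
        # Drop clusters with fewer features than the threshold (claim 31).
        if len(members) >= min_size:
            pts = [points[k] for k in members]
            centroid = tuple(sum(axis) / len(pts) for axis in zip(*pts))
            clusters.append((centroid, sorted(members)))
    return clusters
```

The pass is quadratic in the number of features, which is adequate for a sketch; a spatial grid would be the usual refinement for larger scene descriptions.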

32. (Original) The method of claim 30 further including: 

selecting the position of the clusters that match a predefined criteria; 
recording the position of the clusters that match the predefined criteria as object position 
coordinates; and 

outputting the object position coordinates. 

33. (Original) The method of claim 30 further including determining the presence of a user
from the clusters by checking features within a presence detection region. 

34. (Original) The method of claim 32 wherein calculating the position for each of the 
clusters excludes those features in the clusters that are outside of an object detection region.

35. (Original) The method of claim 32 further including defining a dynamic object detection
region based on the object position coordinates. 

36. (Original) The method of claim 35 wherein the dynamic object detection region is 
defined relative to a user's body. 

37. (Original) The method of claim 32 further including defining a body position detection 
region based on the object position coordinates. 

38. (Original) The method of claim 37 wherein defining the body position detection region 
further includes detecting a head position of the user. 

39. (Original) The method of claim 32 further including smoothing the motion of the object 
position coordinates to eliminate jitter between consecutive image frames. 
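
For illustration only, and not as part of the claim language, the smoothing of claim 39 might be sketched with an exponential moving average. The claim does not name a filter; the choice of filter and the parameter `alpha` are assumptions of this sketch.

```python
from typing import Optional, Sequence, Tuple


class PositionSmoother:
    """Exponential moving average over successive object position coordinates."""

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest frame
        self.state: Optional[Tuple[float, ...]] = None

    def update(self, position: Sequence[float]) -> Tuple[float, ...]:
        """Blend the new position with the running estimate and return the result."""
        if self.state is None:
            # The first frame initializes the estimate directly.
            self.state = tuple(float(c) for c in position)
        else:
            self.state = tuple(
                self.alpha * p + (1.0 - self.alpha) * s
                for p, s in zip(position, self.state)
            )
        return self.state
```

A smaller `alpha` suppresses more jitter between consecutive frames at the cost of added lag.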

40. (Original) The method of claim 32 further including calculating hand orientation 
information from the object position coordinates. 

41. (Original) The method of claim 40 wherein outputting the object position coordinates
further includes outputting the hand orientation information. 

42. (Original) The method of claim 40 further including smoothing the changes in the hand 
orientation information. 

43. (Original) The method of claim 36 wherein defining the dynamic object detection region 
includes: 

identifying a position of a torso-divisioning plane from the collection of features; and 
determining the position of a hand detection region relative to the torso-divisioning plane
in the axis perpendicular to the torso divisioning plane. 

44. (Original) The method of claim 43 further including: 

identifying a body center position and a body boundary position from the collection of 
features; 

identifying a position indicating part of an arm of the user from the collection of features 
using the intersection of the feature pair cluster with the torso divisioning plane; and 

identifying the arm as either a left arm or a right arm using the arm position relative to the 
body position. 

45. (Original) The method of claim 44 further including establishing a shoulder position 
from the body center position, the body boundary position, the torso-divisioning plane, and the 
left arm or the right arm identification. 

46. (Original) The method of claim 45 wherein defining the dynamic object detection region 
includes determining position data for the hand detection region relative to the shoulder position. 

47. (Original) The method of claim 46 further including smoothing the position data for the 
hand detection region. 

48. (Original) The method of claim 45 further including: 

determining the position of the dynamic object detection region relative to the torso 
divisioning plane in the axis perpendicular to the torso divisioning plane; 

determining the position of the dynamic object detection region in the horizontal axis 
relative to the shoulder position; and 

determining the position of the dynamic object detection region in the vertical axis relative to an 
overall height of the user using the body boundary position.

49. (Original) The method of claim 36 wherein defining the dynamic object detection region
includes: 

establishing the position of a top of the user's head using topmost feature pairs of the 
collection of features unless the topmost feature pairs are at the boundary; and 

determining the position of a hand detection region relative to the top of the user's head. 

50. (Currently Amended) A method of using stereo vision to interface with a computer, the 
method comprising: 

capturing a stereo image using a stereo camera; 

defining a region of interest within a field of view of the stereo image and smaller than 
the field of view; 

processing the stereo image to determine position information of an object in the region 
of interest with respect to the region of interest, the object being controlled by a user;[[;]] 

processing the stereo image to identify feature information, to produce a scene 
description from the feature information, and to identify matching pairs of features in the stereo 
image; 

calculating a disparity and a position for each matching feature pair to create the scene 
description; 

analyzing the scene description in a scene analysis process to determine position 
information of the object; 

clustering the feature information in the region of interest into clusters having a collection 
of features by comparison to neighboring feature information within a predefined range; 

calculating a position for each of the clusters; and 

using the position information to allow the user to interact with a computer application.

51. (Original) The method of claim 50 further including:

mapping the position of the object from the feature information from camera coordinates 
to screen coordinates associated with the computer application; and 

using the mapped position to interface with the computer application. 
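
For illustration only, and not as part of the claim language, the mapping from camera coordinates to screen coordinates in claim 51 might be sketched as a clamped linear map. The rectangular region bounds, the flipped vertical axis and all parameter names are assumptions of this sketch.

```python
from typing import Sequence, Tuple


def map_to_screen(
    pos: Sequence[float],
    region_min: Sequence[float],
    region_max: Sequence[float],
    screen_w: int,
    screen_h: int,
) -> Tuple[int, int]:
    """Map a camera-space (x, y) inside a detection region to pixel coordinates."""
    # Normalize each axis to the range [0, 1] within the region.
    nx = (pos[0] - region_min[0]) / (region_max[0] - region_min[0])
    ny = (pos[1] - region_min[1]) / (region_max[1] - region_min[1])
    # Clamp so positions outside the region stick to the screen edge.
    nx = min(max(nx, 0.0), 1.0)
    ny = min(max(ny, 0.0), 1.0)
    # Assumption: camera y points up while screen y points down, so y is flipped.
    return (round(nx * (screen_w - 1)), round((1.0 - ny) * (screen_h - 1)))
```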

52. (Original) The method of claim 50 further including: 

recognizing a gesture associated with the object by analyzing changes in the position 
information of the object in the scene description; and 

combining the position information and the gesture to interface with the computer 
application. 

53. (Original) The method of claim 50 wherein the step of capturing the stereo image further 
includes capturing the stereo image using a stereo camera. 

54. (Currently amended) A stereo vision system for interfacing with an application program 
running on a computer, the stereo vision system comprising: 

first and second video cameras arranged in an adjacent configuration and operable to 
produce a series of stereo video images; and 

a processor operable to receive the series of stereo video images and detect objects 
appearing in an intersecting field of view of the cameras, the processor executing a process to: 
define an object detection region in three-dimensional coordinates relative to a 
position of the first and second video cameras; 

generate a scene description; 

select a control object as a cluster of features from the scene description appearing 
within the object detection region; and 

map position coordinates of the control object to a position indicator associated 
with the application program as the control object moves within the object detection region.

55. (Original) The stereo vision system of claim 54 wherein the process selects as a control
object a detected object appearing closest to the video cameras and within the object detection 
region. 

56. (Original) The stereo vision system of claim 54 wherein the control object is a human 
hand. 

57. (Original) The stereo vision system of claim 54 wherein a horizontal position of the 
control object relative to the video cameras is mapped to an x-axis screen coordinate of the 
position indicator. 

58. (Original) The stereo vision system of claim 54 wherein a vertical position of the control 
object relative to the video cameras is mapped to a y-axis screen coordinate of the position 
indicator. 

59. (Original) The stereo vision system of claim 54 wherein the processor is configured to:
map a horizontal position of the control object relative to the video cameras to an x-axis
screen coordinate of the position indicator;

map a vertical position of the control object relative to the video cameras to a y-axis 
screen coordinate of the position indicator; and 

emulate a mouse function using the combined x-axis and y-axis screen coordinates 
provided to the application program. 

60. (Original) The stereo vision system of claim 59 wherein the processor is further 
configured to emulate buttons of a mouse using gestures derived from the motion of the object 
position.

61. (Original) The stereo vision system of claim 59 wherein the processor is further
configured to emulate buttons of a mouse based upon a sustained position of the control object in 
any position within the object detection region for a predetermined time period. 
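
For illustration only, and not as part of the claim language, the sustained-position trigger of claim 61 might be sketched as a dwell timer. The tolerance radius used to decide whether a position counts as "sustained", and the class and parameter names, are assumptions of this sketch.

```python
import math
from typing import Optional, Sequence, Tuple


class DwellDetector:
    """Report a button event when the control object holds still long enough."""

    def __init__(self, hold_seconds: float, tolerance: float) -> None:
        self.hold_seconds = hold_seconds  # the predetermined time period
        self.tolerance = tolerance        # hypothetical allowance for small movement
        self._anchor: Optional[Tuple[float, ...]] = None
        self._since = 0.0

    def update(self, position: Sequence[float], timestamp: float) -> bool:
        """Feed one frame; return True on the frame where the dwell completes."""
        if self._anchor is None or math.dist(position, self._anchor) > self.tolerance:
            # The object moved (or this is the first frame): restart the timer here.
            self._anchor = tuple(position)
            self._since = timestamp
            return False
        if timestamp - self._since >= self.hold_seconds:
            # Restart the timer so a continued hold fires again one period later.
            self._since = timestamp
            return True
        return False
```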

62. (Original) The stereo vision system of claim 59 wherein the processor is further 
configured to emulate buttons of a mouse based upon a position of the position indicator being 
sustained within the bounds of an interactive display region for a predetermined time period. 

63. (Original) The stereo vision system of claim 54 wherein the processor is further 
configured to map a z-axis depth position of the control object relative to the video cameras to a 
virtual z-axis screen coordinate of the position indicator. 

64. (Original) The stereo vision system of claim 54 wherein the processor is further 
configured to: 

map an x-axis position of the control object relative to the video cameras to an x-axis
screen coordinate of the position indicator; 

map a y-axis position of the control object relative to the video cameras to a y-axis screen 
coordinate of the position indicator; and 

map a z-axis depth position of the control object relative to the video cameras to a virtual 
z-axis screen coordinate of the position indicator. 

65. (Original) The stereo vision system of claim 64 wherein a position of the position
indicator being within the bounds of an interactive display region triggers an action within the 
application program. 

66. (Original) The stereo vision system of claim 54 wherein movement of the control object 
along a z-axis depth position that covers a predetermined distance within a predetermined time 
period triggers a selection action within the application program.

67. (Original) The stereo vision system of claim 54 wherein a position of the control object
being sustained in any position within the object detection region for a predetermined time 
period triggers a selection action within the application program. 

68. (Currently amended) A stereo vision system for interfacing with an application program 
running on a computer, the stereo vision system comprising: 

first and second video cameras arranged in an adjacent configuration and operable to 
produce a series of stereo video images; and 

a processor operable to receive the series of stereo video images and detect objects 
appearing in the intersecting field of view of the cameras, the processor executing a process to: 
define an object detection region in three-dimensional coordinates relative to a 
position of the first and second video cameras; 

generate a scene description; 

detect an object as a cluster of features within the scene description; 
select as a control object a detected object appearing closest to the video cameras 
and within the object detection region; 

define sub regions within the object detection region; 
identify a sub region occupied by the control object; 

associate with that sub region an action that is activated when the control object 
occupies that sub region; and 

apply the action to interface with a computer application. 

69. (Original) The stereo vision system of claim 68 wherein the action associated with the 
sub region is further defined to be an emulation of the activation of keys associated with a 
computer keyboard.

70. (Original) The stereo vision system of claim 68 wherein a position of the control object
being sustained in any sub region for a predetermined time period triggers the action. 

71. (Currently amended) A stereo vision system for interfacing with an application program 
running on a computer, the stereo vision system comprising: 

first and second video cameras arranged in an adjacent configuration and operable to 
produce a series of stereo video images; and 

a processor operable to receive the series of stereo video images and detect objects 
appearing in an intersecting field of view of the cameras, the processor executing a process to: 

define a region within the intersecting field of view of the stereo image and 
smaller than the intersecting field of view; 

generate a scene description; 

detect an object as a cluster of features within the scene description; 
identify, with respect to the region, [[an]] a detected object perceived as the 
largest object appearing in the region and positioned at a predetermined depth range; 
select the object as an object of interest; 

determine a position coordinate representing a position of the object of interest; and

use the position coordinate as an object control point to control the application program.

72. (Original) The system of claim 71 wherein the process causes the processor to: 
determine and store a neutral control point position; 

map a coordinate of the object control point relative to the neutral control point position; and

use the mapped object control point coordinate to control the application program.

73. (Previously presented) The system of claim 72 wherein the process causes the processor
to: 

define the region having a position based upon the position of the neutral control point 
position; 

map the object control point relative to its position within the region; and 

use the mapped object control point coordinate to control the application program. 

74. (Original) The system of claim 72 wherein the process causes the processor to: 
transform the mapped object control point to a velocity function;

determine a viewpoint associated with a virtual environment of the application program; and

use the velocity function to move the viewpoint within the virtual environment.

75. (Original) The system of claim 71 wherein the process causes the processor to map a 
coordinate of the object control point to control a position of an indicator within the application 
program. 

76. (Original) The system of claim 75 wherein the indicator is an avatar. 

77. (Original) The system of claim 71 wherein the process causes the processor to map a 
coordinate of the object control point to control an appearance of an indicator within the 
application program. 

78. (Original) The system of claim 77 wherein the indicator is an avatar. 

79. (Original) The system of claim 71 wherein the object of interest is a human appearing 
within the intersecting field of view. 



Applicant : Evan HILDRETH et al. Attorney's Docket No.: 12121-002001 

Serial No. : 09/909,857 

Filed : July 23, 2001 

Page : 16 of 23 



80. (Currently amended) A stereo vision system for interfacing with an application program 
running on a computer, the stereo vision system comprising: 

first and second video cameras arranged in an adjacent configuration and operable to 
produce a series of stereo video images; and 

a processor operable to receive the series of stereo video images and detect objects 
appearing in an intersecting field of view of the cameras, the processor executing a process to: 
generate a scene description; 

detect an object as a cluster of features within the scene description; 
identify [[an]] a detected object perceived as the largest object appearing in the 
intersecting field of view of the cameras and positioned at a predetermined depth range; 
select the object as an object of interest; 

define a control region between the cameras and the object of interest, the control 
region being positioned at a predetermined location and having a predetermined size relative to a 
size and a location of the object of interest; 

search the control region for a point associated with the object of interest that is 
closest to the cameras and within the control region; 

select the point associated with the object of interest as a control point if the point 
associated with the object of interest is within the control region; and 

map position coordinates of the control point, as the control point moves within 
the control region, to a position indicator associated with the application program. 

81. (Original) The system of claim 80 wherein the processor is operable to: 

map a horizontal position of the control point relative to the video cameras to an x-axis 
screen coordinate of the position indicator; 

map a vertical position of the control point relative to the video cameras to a y-axis 
screen coordinate of the position indicator; and 

emulate a mouse function using a combination of the x-axis and the y-axis screen 
coordinates.

82. (Original) The system of claim 80 wherein the processor is operable to:

map an x-axis position of the control point relative to the video cameras to an x-axis screen
coordinate of the position indicator; 

map a y-axis position of the control point relative to the video cameras to a y-axis screen 
coordinate of the position indicator; and 

map a z-axis depth position of the control point relative to the video cameras to a virtual 
z-axis screen coordinate of the position indicator. 

83. (Original) The system of claim 80 wherein the object of interest is a human appearing 
within the intersecting field of view. 

84. (Original) The system of claim 80 wherein the control point is associated with a human 
hand appearing within the control region. 

85. (Currently amended) A stereo vision system for interfacing with an application program 
running on a computer, the stereo vision system comprising: 

first and second video cameras arranged in an adjacent configuration and operable to 
produce a series of stereo video images; and 

a processor operable to receive the series of stereo video images and detect objects 
appearing in an intersecting field of view of the cameras, the processor executing a process to: 
define an object detection region in three-dimensional coordinates relative to a 
position of the first and second video cameras; 

generate a scene description of the intersecting field of view within the object 
detection region; 

detect an object as a cluster of features within the scene description; 
select up to two hand objects from the [[objects appearing in the intersecting field
of view that are within the object detection region]] clusters of features within the scene
description; and

map position coordinates of the hand objects, as the hand objects move within the 
object detection region, to positions of virtual hands associated with an avatar rendered by the 
application program. 

86. (Original) The system of claim 85 wherein the process selects the up to two hand objects 
from the objects appearing in the intersecting field of view that are closest to the video cameras 
and within the object detection region. 

87. (Original) The system of claim 85 wherein the avatar takes the form of a human-like 
body. 

88. (Original) The system of claim 85 wherein the avatar is rendered in and interacts with a 
virtual environment forming part of the application program. 

89. (Original) The system of claim 88 wherein the processor further executes a process to 
compare the positions of the virtual hands associated with the avatar to positions of virtual 
objects within the virtual environment to enable a user to interact with the virtual objects within 
the virtual environment. 

90. (Original) The system of claim 85 wherein the processor further executes a process to: 
detect position coordinates of a user within the intersecting field of view; and 

map the position coordinates of the user to a virtual torso of the avatar rendered by the 
application program.

91. (Original) The system of claim 85 wherein the process moves at least one of the virtual
hands associated with the avatar to a neutral position if a corresponding hand object is not 
selected. 

92. (Original) The system of claim 85 wherein the processor further executes a process to: 
detect position coordinates of a user within the intersecting field of view; and 

map the position coordinates of the user to a velocity function that is applied to the avatar 
to enable the avatar to roam through a virtual environment rendered by the application program. 

93. (Original) The system of claim 92 wherein the velocity function includes a neutral 
position denoting zero velocity of the avatar. 
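
For illustration only, and not as part of the claim language, a velocity function with a neutral position denoting zero velocity, as in claims 92 and 93, might be sketched as follows. The dead zone around the neutral position, the linear gain and the parameter names are assumptions of this sketch.

```python
from typing import Sequence, Tuple


def velocity_from_offset(
    position: Sequence[float],
    neutral: Sequence[float],
    dead_zone: float,
    gain: float,
) -> Tuple[float, ...]:
    """Map the user's offset from the neutral position to a per-axis velocity."""
    velocity = []
    for p, n in zip(position, neutral):
        offset = p - n
        if abs(offset) <= dead_zone:
            # At or near the neutral position the avatar does not move.
            velocity.append(0.0)
        else:
            # Beyond the dead zone, speed grows linearly with the extra offset.
            sign = 1.0 if offset > 0 else -1.0
            velocity.append(gain * (offset - sign * dead_zone))
    return tuple(velocity)
```

Subtracting the dead zone before applying the gain keeps the velocity continuous as the user leaves the neutral position.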

94. (Original) The system of claim 93 wherein the processor further executes a process to 
map the position coordinates of the user relative to the neutral position into torso coordinates 
associated with the avatar so that the avatar appears to lean. 

95. (Original) The system of claim 92 wherein the processor further executes a process to
compare the position of the virtual hands associated with the avatar to positions of virtual objects 
within the virtual environment to enable the user to interact with the virtual objects while 
roaming through the virtual environment. 

96. (Original) The system of claim 85 wherein a virtual knee position associated with the 
avatar is derived by the application program and used to refine an appearance of the avatar. 

97. (Original) The system of claim 85 wherein a virtual elbow position associated with the 
avatar is derived by the application program and used to refine an appearance of the avatar.

98. (Original) The system of claim 85 further comprising a third video camera arranged in
an adjacent configuration with the first and second video cameras and operable to produce the 
series of stereo video images. 

99. (New) A method of using computer vision to interface with a computer, the method 
comprising: 

generating a scene description that includes an indication of a three-dimensional position 
of a feature included in a scene; 

analyzing the scene description including the indication of the three-dimensional position 
of the feature to determine position information of an object within the scene; and 

using the position information to control a computer application. 

100. (New) The method of claim 99 wherein generating the scene description comprises 
generating the scene description from stereo images. 

101. (New) The method of claim 99 wherein:

generating a scene description comprises generating a scene description that includes an 
indication of a three-dimensional position of a feature included in a scene and an indication of a
shape of the feature; and 

analyzing the scene description comprises analyzing the scene description including the 
indication of the three-dimensional position of the feature and the indication of the shape of the 
feature to determine position information of an object. 



